Pedagogic challenges 2: urban systems science

Talk overview

Discipline background

Part 1: Urban systems science?
Part 2: Spatial data

Pedagogic challenges

Part 3: Data contamination / manipulation
Part 4: Big data
Part 5: Reproducibility
Part 6: Teaching criticality, data bias, reproducibility

Who am I?

Lecturer in Spatial Data Science and Visualization at CASA, UCL
Lead MSc modules in:
- Geographic information systems and science
- Remotely sensing cities and environments
Research:
- Applications of data for city decisions / sustainability
- Big data for allocating funding

Part 1: Urban systems science?

What do we mean

Urban systems

A set of towns and cities [or functions within cities] that can be considered linked together by various forms of social and economic interaction

Source: Oxford reference

Systems thinking

Methods aimed at studying a system through its collective behavioral features

Source: Cristiano et al. 2020

Tools for Systems Thinkers: The 6 Fundamental Concepts of Systems Thinking

Science of cities

The science of cities – using evidence to understand how cities work – is forever expanding

Source: UK Government

Urban science

Urban science is an interdisciplinary field that studies diverse urban issues and problems

Source: Wikipedia

Urban systems science

Urban Systems: Cities [or functions within cities] that can be considered linked together [there is a relationship between them]

+

Urban Science: Urban issues and problems

=

Smart Cities: networks and services are made more efficient with the use of digital solutions for the benefit of its inhabitants and business.

Source: Smart Cities, European Commission

Urban system science approach:

Updated from Grolemund & Wickham's classic R4DS schematic, envisioned by Dr. Julia Lowndes for her 2019 useR! keynote talk and illustrated by Allison Horst. Source: Allison Horst data science and stats illustrations

The same as regular data science but with spatial data

An example: Urban Heat Island (UHI) effect

Fremantle Woolstore, Western Australia

An example: UHI

Four scenarios were run:

- Original (existing) development (from satellite imagery)
- Proposed redevelopment as in the plan
- Proposed redevelopment with the trees removed
- Proposed redevelopment with trees covering the hottest pixels

How smart are cities?

Part 2: Spatial data

What is spatial data?

The Earth is a 3D sphere (well, almost): it is slightly wider than it is tall
In order to locate a point on the surface of a sphere, we need a set of coordinates
Coordinates will tell us how near to the top or bottom of the sphere we are, or how far around
But where do we start?

What is spatial data 2?

Geographic Coordinate Reference System

Treats the globe as if it were a sphere divided into 360 equal parts called degrees

Projected Coordinate Reference System

A flat, two-dimensional plane (made by projecting a spheroid onto a 2D surface), giving it constant lengths, angles and areas across the two dimensions
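To make the difference concrete, here is a minimal sketch that converts geographic coordinates (degrees) into projected coordinates (metres). It assumes the spherical Web Mercator formula purely as an illustration; the slides do not name a specific projection, and the function name and sample location are made up for this example.

```python
import math

# Radius assumed by spherical Web Mercator (the WGS84 semi-major axis, in metres)
EARTH_RADIUS_M = 6378137.0

def to_web_mercator(lon_deg, lat_deg):
    """Project geographic coordinates (degrees) to spherical Web Mercator (metres)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# Approximate coordinates for central London (illustrative values only)
london_x, london_y = to_web_mercator(-0.1276, 51.5072)
print(round(london_x), round(london_y))
```

The input is an angle on the globe; the output is a position on a flat plane measured in metres, which is what makes distance and area calculations straightforward.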

Simply

Spatial data is just like normal data except it has an extra “geometry column”
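A minimal sketch of that idea, using plain Python dictionaries rather than any GIS library. The site names, visitor counts and coordinates are invented for illustration, and the geometry is written as Well-Known Text (WKT), which is one common way to store it.

```python
# Ordinary tabular data: one record per row, one key per column
rows = [
    {"name": "Site A", "visitors": 120},
    {"name": "Site B", "visitors": 45},
]

# The same table made spatial by adding a geometry column
# (WKT points, longitude then latitude; values are made up)
spatial_rows = [
    {"name": "Site A", "visitors": 120, "geometry": "POINT (-0.134 51.522)"},
    {"name": "Site B", "visitors": 45, "geometry": "POINT (-0.076 51.508)"},
]

def parse_wkt_point(wkt):
    """Parse a simple 'POINT (x y)' string into an (x, y) tuple of floats."""
    inner = wkt.strip()[len("POINT"):].strip().strip("()")
    x, y = inner.split()
    return float(x), float(y)

# Non-spatial operations still work exactly as before...
busy = [r["name"] for r in spatial_rows if r["visitors"] > 100]

# ...and the geometry column adds the ability to ask "where?"
coords = {r["name"]: parse_wkt_point(r["geometry"]) for r in spatial_rows}
print(busy, coords)
```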

Pedagogic challenges

Part 3: Data contamination / manipulation?

Data contamination / manipulation

OK, what about geographic data?

Who has made our boundary data?

Who has manipulated our boundary data?

Who has made our boundary data?

Redlining

1930s: the American Home Owners’ Loan Corporation (HOLC), set up to prevent missed payments, drew residential security maps based on race
- People abandon areas
- Can’t refinance
- Less property tax for services
- Social equity issues remain
- 1968 Fair Housing Act

Who has made our boundary data?

Gerrymandering

Every 10 years electoral districts are redrawn (“redistricting”). Thomas Hofeller (Republican) = PACK and CRACK

PACK = put all the Democratic voters in 1 district
CRACK = spread them across districts so they never have a majority
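A small worked example shows why boundaries matter so much. The electorate below is entirely hypothetical: 50 voters in 5 equal districts, 24 of whom support "party A". The same voters produce very different seat counts depending only on where the lines are drawn.

```python
def seats_won(districts):
    """Count districts where party A holds a strict majority.

    Each district is an (a_voters, other_voters) pair.
    """
    return sum(1 for a, other in districts if a > other)

# Hypothetical plan drawn against party A
packed_and_cracked = [
    (10, 0),  # PACK: one district filled entirely with party A voters
    (4, 6),   # CRACK: the remaining 14 party A voters are spread thinly...
    (4, 6),
    (3, 7),
    (3, 7),   # ...so they fall short of a majority everywhere else
]

# Hypothetical plan drawn in favour of party A, using the same voters
drawn_for_a = [(6, 4), (6, 4), (6, 4), (6, 4), (0, 10)]

for plan in (packed_and_cracked, drawn_for_a):
    assert sum(a for a, _ in plan) == 24      # same party A voters in both plans
    assert all(a + o == 10 for a, o in plan)  # equal-sized districts

print(seats_won(packed_and_cracked), seats_won(drawn_for_a))  # prints "1 4"
```

With 48% of the vote, party A wins either 1 seat or 4 seats out of 5. Nothing about the voters changed, only the boundary data.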

“Redistricting is democracy at work” - Tom Hofeller

Pedagogic challenges

Part 4: Big Data

Big data

Big geospatial data include datasets that are too large to be processed using traditional GIS tools

Source: GIS Harvard

Why are they large?

Raster

Landsat satellite data: 400 scenes of Earth a day, revisiting each location every 16 days
- Each scene is about 1GB
- We’d use Google Earth Engine for this - not considered here

Vector

New York City Taxi and Limousine Commission (TLC) all records from Yellow and Green Cabs
- 150GB, 1.2 billion records
OpenStreetMap
- 1764.5GB when uncompressed

What can we do about it?

Parquet files

We are moving from row-based storage to column-based storage
About 50x faster than a .csv
It groups our data.
- For example, with a row group size of 2, all the data from rows 1 and 2 is stored next to each other, then rows 3 and 4, and so on = GROUPS or PARTITIONS
- If we have large data this means we can skip the groups we don’t need
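The grouping and skipping idea can be sketched in plain Python. This is a toy model, not the real Parquet file format or any library API: the records, the function names and the `fare` column are assumptions made for illustration. It stores each group column by column and keeps a minimum and maximum per column, so a query can rule out a whole group without reading its values.

```python
# Toy records (values made up for illustration)
rows = [
    {"trip_id": 1, "fare": 7.5},
    {"trip_id": 2, "fare": 12.0},
    {"trip_id": 3, "fare": 30.0},
    {"trip_id": 4, "fare": 42.5},
]

def to_row_groups(rows, group_size):
    """Split rows into groups, store each group column by column,
    and keep (min, max) statistics for every column in the group."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        columns = {key: [r[key] for r in chunk] for key in chunk[0]}
        stats = {key: (min(vals), max(vals)) for key, vals in columns.items()}
        groups.append({"columns": columns, "stats": stats})
    return groups

def fares_above(groups, threshold):
    """Return fares above threshold, skipping groups whose maximum is too low."""
    result, groups_read = [], 0
    for group in groups:
        _, max_fare = group["stats"]["fare"]
        if max_fare <= threshold:
            continue  # the whole group is skipped without reading its values
        groups_read += 1
        result.extend(f for f in group["columns"]["fare"] if f > threshold)
    return result, groups_read

groups = to_row_groups(rows, group_size=2)
fares, groups_read = fares_above(groups, threshold=20.0)
print(fares, groups_read)  # prints "[30.0, 42.5] 1"
```

Only one of the two groups is read, and within it only the `fare` column is touched. On billions of rows, that is where much of the speed-up comes from.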

New York City Taxi and Limousine Commission (TLC) all records from Yellow and Green Cabs

Concepts

You may come across Arrow: this is an in-memory format, whereas Parquet is a storage (on-disk) format

In the R for Data Science book a 9GB .csv is queried in
- 11 seconds for standard code
- 0.063 seconds using a Parquet file! Over 100x faster

We can go faster!

DuckDB

Database management system
Columnar data
No installation
Convert our Parquet file to DuckDB and back again!

to_duckdb() 
to_arrow()

Regarding performance, Parquet is 717 times faster than the same query on a CSV file, and DuckDB is 2808 times faster.

Source: Christophe Nicault

Notes

Both (Parquet and DuckDB) make use of dplyr! select(), filter(), group_by() = direct integration with R
Currently the support for spatial data is very limited
sfarrow - can load and query the data but can’t do any analysis!

Postgres

Postgres = object-relational database

PostgreSQL has a PostGIS extension

This allows the “geometry” column and spatial queries

Making random points in polygons

5 million random points

QGIS = 226 seconds
PostGIS = 18 seconds

Source: Why should you care about PostGIS? — A gentle introduction to spatial databases
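For a sense of what the benchmark task involves, here is a minimal pure-Python sketch of generating random points inside a polygon. It assumes a simple approach (rejection sampling with a ray-casting point-in-polygon test) and is not how either QGIS or PostGIS implements it; the triangle and function names are hypothetical.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices; the last vertex connects to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_points_in_polygon(polygon, count, seed=0):
    """Rejection sampling: draw points in the bounding box, keep those inside."""
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    while len(points) < count:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points

# A hypothetical triangular "district" in arbitrary planar units
triangle = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = random_points_in_polygon(triangle, count=1000)
print(len(points))
```

Every candidate point is checked against every polygon edge, so the cost grows quickly with millions of points and complex boundaries. That is the kind of workload where a spatial database pays off.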

PostGIS

Starting

Despite all these tools we must start with the basics.
Often this is in Quantum GIS (QGIS, free) or ArcMap ($)
We will be exploring QGIS in the workshop later

Pedagogic challenges

Part 5: Reproducibility

What led me here?

Lecture with Carl Howe

In 2017: “90% of the data in the world today has been created in the last two years alone, at 2.5 quintillion bytes of data a day!” (IBM)

OK, what about geographic data?

A shifting landscape

Paper: Opening practice: supporting reproducibility and critical spatial data science

A comparison of Geographically Weighted Regression (GWR) across:
- 4 open software packages
- 2 black box / commercial implementations

All of the implementations were tested with the same input data.

They all gave the same results except the ESRI/ArcGIS implementation (Li 2018)

and although ESRI provide help for the GWR tools, the actual coding is closed—the underlying code is not revealed

Source: Brunsdon and Comber, 2021

Part 6: Teaching criticality, data bias, reproducibility

1. Lead by example

1b. Listen to Alumni / employers

1c. Learn by doing

2. Don’t assess it, make it mandatory for the assessment*

1. Lead by example

Traditional labs were distributed as PDFs, Word documents and PowerPoints.
Used ArcGIS 💰

1. Lead by example

1b. Listen to Alumni / employers

1c. Design and outputs

Learning happens by doing

Weekly homework that we dedicate time to discussing

Week 1-5 tasks
Week 6-9 practice exam

1c. Design and outputs

Part 1: GIS tools…subject based learning

You need to calculate the average percentage of science students (in all grades) per county meeting the required standards
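As a rough sketch of what this kind of tool-focused task asks for, the snippet below groups records by county and averages a percentage column. The county names, grades, values and column names are all hypothetical, and it takes a simple unweighted mean across grades, which is only one possible reading of the task.

```python
from collections import defaultdict

# Hypothetical records: one row per county and grade, with made-up values
records = [
    {"county": "Adams", "grade": 5, "pct_met_standard": 60.0},
    {"county": "Adams", "grade": 8, "pct_met_standard": 70.0},
    {"county": "Benton", "grade": 5, "pct_met_standard": 40.0},
    {"county": "Benton", "grade": 8, "pct_met_standard": 50.0},
    {"county": "Benton", "grade": 11, "pct_met_standard": 45.0},
]

# Group the percentages by county...
by_county = defaultdict(list)
for rec in records:
    by_county[rec["county"]].append(rec["pct_met_standard"])

# ...then take the unweighted mean within each county
county_means = {county: sum(vals) / len(vals) for county, vals in by_county.items()}
print(county_means)  # prints "{'Adams': 65.0, 'Benton': 45.0}"
```

In a GIS workflow the result would then be joined to the county boundaries for mapping.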

Part 2: GIS analysis… problem based learning

Each practical answers a question….

What are the factors that might lead to variation in Average GCSE point scores across the city?

What are we assessing?

Can students apply the tools / methods with different scenarios and data?

Can students critique the process?

2. Make it mandatory for the assessment

Part 2: GIS analysis, example practice question

New York City wishes to conduct a study that aims to prevent people being evicted by understanding possible related factors. You have been enlisted as a consultant and tasked with conducting an analysis of their data from 2020.

Data:

2. Make it mandatory for the assessment

DISCUSS

How were the evictions recorded?
Why were there limited evictions during 2020, then a sudden peak? - COVID ban on evictions
How can identifying spatially related factors to evictions be useful?
Are there certain areas that have higher evictions than others - why might this be?
What assumptions does the data make?
What assumptions does the model make?

2. Make it mandatory for the assessment

Students

Click the URL, which generates a new repository
Staff can see their work and when they make edits (commit / push)

Conclusion

It is essential to use data to inform decisions…BUT we must develop a critical awareness of:
- How the data has been created
- How the boundary data has been created
- What the agenda was for collecting the data
In addition we must recognize that:
- Data is a snapshot / sample of the population
- Analysis attempts to model the world - it is never perfect.

Scientists must have a say in the future of cities, McPhearson 2016